-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make mapv_into_any() work for ArcArray, resolves #1280 #1283
base: master
Are you sure you want to change the base?
Conversation
e8a73e6
to
2ac3c9f
Compare
tests/array.rs
Outdated
#[test] | ||
fn mapv_into_any_diff_types() { | ||
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]]; | ||
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]]; | ||
assert_eq!(a.mapv_into_any(|a| a.round() as i32 % 2 == 0), a_even); | ||
let b: Array<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0); | ||
assert_eq!(b, a_even); | ||
} | ||
|
||
#[test] | ||
fn mapv_into_any_arcarray_same_type() { | ||
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared(); | ||
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.); | ||
assert_eq!(b, a_plus_one); | ||
} | ||
|
||
#[test] | ||
fn mapv_into_any_arcarray_diff_types() { | ||
let a: ArcArray<f64, _> = array![[1., 2., 3.], [4., 5., 6.]].into_shared(); | ||
let a_even: Array<bool, _> = array![[false, true, false], [true, false, true]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a.round() as i32 % 2 == 0); | ||
assert_eq!(b, a_even); | ||
} | ||
|
||
#[test] | ||
fn mapv_into_any_diff_outer_types() { | ||
let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]]; | ||
let a_plus_one: Array<f64, _> = array![[2., 3., 4.], [5., 6., 7.]]; | ||
let b: ArcArray<_, _> = a.mapv_into_any(|a| a + 1.); | ||
assert_eq!(b, a_plus_one); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a test here that looks into ArcArray
--> Array
? That functionality is the original behavior of the function and should remain (or else we'd have to mark as an even greater breaking change). Adding other various tests from the combinatorial set would also be good, e.g., ArcArray<i32>
--> Array<f32>
or some such.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akern40, it's great to see ndarray getting some love! Looking at this with fresh eyes after a year, I realized the implementation can be further simplified to avoid breaking changes. The implementation is similar to the original version of mapv_into_any(). The trait bounds have been rewritten to be generic over the input memory representation (e.g. OwnedRepr vs OwnedArcRepr), but the output is constrained to have the same memory representation as the input. This addresses the issue in #1280 without requiring any type hinting.
I have added some additional doctests and removed the unit test converting between Array <--> ArcArray.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@benkay86 this is awesome! I'm a big fan of the new implementation, I think it's clean. I'm going to close out the other threads in this review; I just had one more request about changing the assert
to debug_assert
(unless you can think of places where the types and traits are satisfied but the assert
fails).
44deee5
to
fcc9967
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great! I like that this approach is consistent with other places in the crate where we output an array of the same type, and the trait bounds appear to be simpler, too.
src/impl_methods.rs
Outdated
// have the same kind of memory representation (OwnedRepr vs | ||
// OwnedArcRepr), then their memory representations should be the | ||
// same type, e.g. OwnedRepr<A> == OwnedRepr<B> | ||
assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as RawDataSubst<B>>::Output>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we change this to a debug_assert
? My impression is that this checks a condition that we think the type system should be ensuring, but we'd like the assert just to be sure; in other words, it's a little mini test case. Unless there are conditions in which the types and trait bounds are satisfied but this assert
is false?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are 100% correct! The type equality should always be satisfied -- the assertions are just a little sanity check around the unsafe code. We totally can change this to a debug_assert!()
. The use of debug_assert!()
appears to be consistent with the rest of the ndarray codebase.
I will point out that the compiler nominally optimizes away the assertion if it can prove the types are equal after monomorphization. For example, comment out line 6 of this example -- the assembly output does not change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Woah! Had no idea, but I suppose that makes sense - I do think it just expands into a conditional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead and changes the assert!()
to a debug_assert!()
as we discussed. Is there anything else you would like to see changed for this PR to be merged?
@benkay86 I was doing some thinking about this method not just in the context of concrete types but also in the context of using it in a method. Let's take the following example that's a play on your tests: a function that uses fn is_modulo<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> Array<bool, D> while under this new version the signature is fn is_modulo<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> ArrayBase<<S as RawDataSubst<bool>>::Output, D> Now, the verbosity is one thing - sometimes we need verbose types, we shouldn't always be afraid of them. My bigger concern is that we've actually erased the trait bounds that we originally had on Sorry I don't have a solution to this, at least not yet, and I've got some other things I'm trying to pull together around the repo. Happy to keep the conversation up, though - maybe I'm missing something? Or if you want to bounce ideas / solutions around. Really appreciate your persistence on this PR. |
I appreciate your help thinking about these trait bounds. The combination of fancy trait bounds + unsafe code can be difficult to reason about! With regard to the verbosity in the return value of your function signature, there may be ways to modify I don't think your second concern about "erasing" trait bounds is valid. (And if we could erase them... would that be a bad thing?) The trait bounds on I have added a test case with If you want to go in a totally different direction, I could try to rewrite |
Sorry I should've been a little clearer when I was talking about "erasing" trait bounds. I'm worried about the output, not the input.
This is my point: we know that, but the compiler doesn't. (EDITED, for a better example) Let's say I use fn parity<S, D>(arr: ArrayBase<S, D>, modulo: i32) -> bool
where --snip-- // Trait bounds that enable `mapv_into_any` and modulo, but I'm too lazy to write right now
{
let output = arr.mapv_into_any(|a| a.round() as i32 % 2 == 0);
// I can only call a limited set of functions on `output`, those that only have the `RawData` bound.
// So I can't map, can't sum, can't take the first or last, can't view, etc
} Idk, maybe I'm being ridiculous. |
Ah, I see what you mean now. We could, without loss of generality for extant types in ndarray, add the following additional trait bounds to pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
where
// ...
<S as ndarray::RawDataSubst<i32>>::Output: DataOwned + DataMut,
{
// ...
} However, it is not actually necessary to place this trait bound in the definition of use num_traits;
fn round<S, A, D>(arr: ArrayBase<S, D>) -> ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>
where
S: ndarray::DataMut<Elem = A> + ndarray::RawDataSubst<i32> + 'static,
A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> + 'static,
D: Dimension,
ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>: From<Array<i32, D>>,
{
arr.mapv_into_any(|a| a.round().as_())
}
#[test]
fn test_round() {
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_rounded: Array<i32, _> = array![1, 3, 3];
let b = round(a);
assert_eq!(b, a_rounded);
} Now let's try something like your fn round_sum<S, A, D>(arr: ArrayBase<S, D>) -> i32
where
S: ndarray::DataMut<Elem = A> + ndarray::RawDataSubst<i32> + 'static,
A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32> + 'static,
D: Dimension,
ArrayBase<<S as ndarray::RawDataSubst<i32>>::Output, D>: From<Array<i32, D>>,
<S as ndarray::RawDataSubst<i32>>::Output: ndarray::Data,
{
let output = arr.mapv_into_any(|a| a.round().as_());
output.sum()
}
#[test]
fn test_round_sum() {
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let b = round_sum(a);
assert_eq!(b, 7);
} As for the |
Might be time for me to sleep! Really excellent analysis here. I'm in favor of adding that bound to the function, as that would provide clearer error messages to users who may otherwise be confused as to why they can't call additional functions. I need to go to bed, but my last consideration now is the verbosity; let's put a bit of thought into whether there's an obvious path forward on lowering it. Happy to think about it this weekend a bit, or if you've got the time and inclination you can throw a design out. (P.S. Noticed you're at WashU! I'd say "Go Bears" or something but god knows we weren't a big sports school when I was there, can't imagine that's changed. Hope the semester is going well!) |
OK, I've been playing around with the ergonomics. Here's one approach using a helper trait with (stable) generalized associated types to cut down on the trait bounds. The helper trait is something like: pub trait MappableData: RawData + 'static {
type Subst<B>: Data<Elem = B> + DataOwned;
fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>;
}
impl<A: 'static> MappableData for OwnedRepr<A> {
type Subst<B> = OwnedRepr<B>;
fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>
{
arr
}
}
impl<A: 'static> MappableData for OwnedArcRepr<A> {
type Subst<B> = OwnedArcRepr<B>;
fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D>
{
arr.into()
}
} The implementation of impl<A, S, D> ArrayBase<S, D>
where
S: DataMut<Elem = A> + MappableData,
D: Dimension,
A: Clone + 'static,
{
pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as MappableData>::Subst<B>, D>
where
F: FnMut(A) -> B,
B: 'static,
{
if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>() {
let f = |a| {
let b = f(a);
unsafe { unlimited_transmute::<B, A>(b) }
};
let output = self.mapv_into(f);
debug_assert!(core::any::TypeId::of::<S>() == core::any::TypeId::of::<<S as MappableData>::Subst<B>>());
unsafe { unlimited_transmute::<ArrayBase<S, D>, ArrayBase<<S as MappableData>::Subst<B>, D>>(output) }
} else {
<S as MappableData>::from_owned_repr::<B,D>(self.mapv(f))
}
}
} Now we can have more ergonomic calls. For concrete types: let a: Array<f64, _> = array![[1., 2., 3.], [4., 5., 6.]];
let a_even = a.mapv_into_any(|a| a.round() as i32 % 2 == 0) Function with generic element type but concrete data representation: fn round<A, D>(arr: Array<A, D>) -> Array<i32, D>
where
A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
D: Dimension,
{
arr.mapv_into_any(|a| a.round().as_())
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_rounded = round(a); Function generic over element and data representation: fn round<A, S, D>(arr: ArrayBase<S, D>) -> ArrayBase<<S as MappableData>::Subst<i32>, D>
where
S: ndarray::DataMut<Elem = A> + MappableData,
A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
D: Dimension,
{
arr.mapv_into_any(|a| a.round().as_())
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_rounded = round(a); Generic function operating on the (generic) output of fn round_sum<S, A, D>(arr: ArrayBase<S, D>) -> i32
where
S: ndarray::DataMut<Elem = A> + MappableData,
A: num_traits::real::Real + num_traits::cast::AsPrimitive<i32>,
D: Dimension,
{
let output = arr.mapv_into_any(|a| a.round().as_());
output.sum()
}
let a: Array<f32, _> = array![1.1, 2.6, 3.0];
let a_sum = round_sum(a); Happy to let these ideas percolate a bit more, but let me know what you think. If we go this route then we'll have to decide on what to call the helper trait
Ha, I am always running into WashU alumni! What program were you in here? I'm a physician/scientist faculty using Rust to power my neuroimaging research. |
I think this is a great design. I want to suggest a slight twist inspired by yours that I think balances between ergonomics for the user and maintenance / fragility for the library (and suggest an alternative name):
pub trait DataSubst<A>: RawDataSubst<A> + DataOwned
where Self::Output: DataOwned
{
fn from_owned<D: Dimension>(self_: Array<A, D>) -> ArrayBase<Self::Output, D>;
} I like the name Unfortunately, this isn't quite as succinct for users as yours, but it has the advantage of not repeating the associated types within the library, which I like. The definition of pub fn mapv_into_any<B, F>(self, mut f: F) -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
where
S: DataMut + DataSubst<B, Output: DataOwned>,
F: FnMut(A) -> B,
A: Clone + 'static,
B: 'static,
{
if core::any::TypeId::of::<A>() == core::any::TypeId::of::<B>() {
// A and B are the same type.
// Wrap f in a closure of type FnMut(A) -> A .
let f = |a| {
let b = f(a);
// Safe because A and B are the same type.
unsafe { unlimited_transmute::<B, A>(b) }
};
// Delegate to mapv_into() using the wrapped closure.
// Convert output to a uniquely owned array of type Array<A, D>.
let output = self.mapv_into(f).into_owned();
// Change the return type from Array<A, D> to Array<B, D>.
// Again, safe because A and B are the same type.
unsafe { unlimited_transmute::<Array<A, D>, ArrayBase<<S as RawDataSubst<B>>::Output, D>>(output) }
} else {
// A and B are not the same type.
// Fallback to mapv().
S::from_owned(self.mapv(f))
}
} Rust won't directly suggest the clean Thoughts?
I did my computer science degrees there! Exciting to see this library being used for research-level neuroscience. |
Oh and I don't think you need the |
It's only needed if you want the extra
I considered this possibility since the trait is, conceptually, a natural extension of impl <A, S, D> ArrayBase<S, D>
where
S: DataMut<Elem=A> + MappableData Instead you must introduce the constraint at the definition of pub fn mapv_into_any<B, F>(self, mut f: F> -> ArrayBase<<S as RawDataSubst<B>>::Output, D>
where
S: DataMut<Elem = A> + DataSubst<B, Output: DataOwned> But now every function calling Unfortunately, the GAT version of
I'm wondering if |
I recently got Miri support working as part of our CI/CD, so it actually will run Miri the next time you push!
This is true, but unfortunately I think it's trickier than that: implementing impl<'a, A: 'a> RawDataSubst for ViewRepr<&'a A>
{
type Output<B> = ViewRepr<&'a B>; // Problem occurs here
unsafe fn data_subst<B>(self) -> Self::Output<B>
{
ViewRepr::new()
}
} You can't write
Ya, like I said, I was trying to find a middle ground. It's still significantly more succinct than the implementation without the additional trait, but it isn't quite as terse as the
No reason it shouldn't, especially if it means a quick |
To make pub trait MappableData: RawData
{
type Subst<B>: Data<Elem = B> + DataOwned where for<'b> B: 'b;
fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D> where for<'b> B: 'b;
}
// omitting other impls for brevity
impl<'a, A> MappableData for CowRepr<'a, A>
{
type Subst<B> = CowRepr<'a, B> where for<'b> B: 'b;
fn from_owned_repr<B, D: Dimension>(arr: Array<B,D>) -> ArrayBase<Self::Subst<B>,D> where for<'b> B: 'b
{
arr.into()
}
} EDIT It looks like the HRB To your broader points about balancing maintainability with ergonomics, my counterargument is:
The final choice is yours. If you still prefer the |
Thanks for giving me time to think about this! After a bit of experimenting this morning, I like your approach. I'd like to propose a few changes:
/// An array representation trait for mapping to an owned array with a different element type.
///
/// Functions such as [`mapv_into_any`](ArrayBase::mapv_into_any) that alter an array's
/// underlying element type often want to preserve the storage type (e.g., `ArcArray`)
/// of the original array. However, because Rust considers `OwnedRepr<A>` and `OwnedRepr<B>`
/// to be completely different types, a trait must be used to indicate what the mapping is.
///
/// This trait will map owning storage types to the element-swapped version of themselves;
/// view types are mapped to `OwnedRepr`. The following table summarizes the mappings:
///
/// | Original Storage Type | Corresponding Array Type | Mapped Storage Type | Output of `from_owned` |
/// | ----------------------- | ------------------------ | ------------------- | ---------------------- |
/// | `OwnedRepr<A>` | `Array<A, D>` | `OwnedRepr<B>` | `Array<B, D>` |
/// | `OwnedArcRepr<A>` | `ArcArray<A, D>` | `OwnedArcRepr<B>` | `ArcArray<B, D>` |
/// | `CowRepr<'a, A>` | `CowArray<'a, A, D>` | `CowRepr<'a, B>` | `CowArray<'a, B, D>` |
/// | `ViewRepr<&'a A>` | `ArrayView<'a, A, D>` | `OwnedRepr<B>` | `Array<B, D>` |
/// | `RawViewRepr<*const A>` | `RawArrayView<A, D>` | `OwnedRepr<B>` | `Array<B, D>` |
pub trait DataMappable: RawData
{
/// The element-swapped, owning storage representation
type Subst<B>: Data<Elem = B> + DataOwned;
/// Cheaply convert an owned [`Array<B, D>`] to a different owning array type, as dictated by `Subst`.
///
/// The owning arrays implement `From`, which is the preferred method for changing the underlying storage.
/// This method (and trait) should be reserved for dealing with changing the element type.
fn from_owned<B, D: Dimension>(self_: Array<B, D>) -> ArrayBase<Self::Subst<B>, D>;
} |
If you can put |
Thank you for your patience. I have finally had a chance to work on this. Please let me know what you think.
|
@benkay86 now it's my turn to thank you - I had to get our BLAS CI/CD situation handled before I could go back to other PRs. I like all of your suggestions - let's leave the You've done tremendous work here. If you would like me to cover the last mile I'd be happy to make the necessary changes. |
Thank you for kind words @akern40! I am happy to make the requested changes unless there is some tight release deadline that I am getting in the way of. After having (yet more) time to think about this, I have two thoughts I want to run by you. Number OneIt turns out there are two roughly-equally-efficient ways to implement The other implementation is to call mapv_into() on the input data representation directly. Then it transmutes from Do you have a preference between these two implementation strategies? Number TwoAs a final point for thought, I wonder if this PR should be merged at all, or if we should mark the original issue #1280 as won't fix. @RReverser never did give us an example of where he thought returning an
In fact, the existing implementation of let output = self.mapv_into(f).into_owned(); For an Pros of this PR: obviously the current situation is confusing to users, and this PR creates some syntactic/generic sugar to make Cons: Rust struggles with trait solving and generic return types in particular. Adding a generic return type and the |
Note this is a breaking change that will require users of
mapv_into_any()
to tell the compiler, by annotating the return type, whether the returned array should be anArray
or anArcArray
.